This model considers an unknown bandit reward process with unrestricted switching times: the intervals between its random sampling times follow a negative exponential distribution, its sampled values follow an Erlang(2) distribution, and switching is permitted at any moment.
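The following is a minimal sketch that simulates the sample stream of a single arm under this model. It assumes the inter-sampling intervals are exponential with a hypothetical rate `sample_rate` and the values are Erlang(2) with a hypothetical rate `value_rate`. The source gives neither rate, and in the unknown-bandit setting the decision maker would not observe `value_rate`. The function name `simulate_arm` is illustrative only and is not the authors' method.

```python
import random
from typing import List, Tuple


def simulate_arm(sample_rate: float, value_rate: float, horizon: float,
                 rng: random.Random) -> List[Tuple[float, float]]:
    """Simulate one arm's (time, value) samples on [0, horizon].

    Hypothetical parameters (not specified in the source):
      sample_rate -- rate of the exponential inter-sampling intervals
      value_rate  -- rate of the Erlang(2) sampled values (mean 2 / value_rate)
    """
    samples: List[Tuple[float, float]] = []
    # The first sampling time is one exponential interval after time 0.
    t = rng.expovariate(sample_rate)
    while t <= horizon:
        # Erlang(2) with rate r is a Gamma distribution with shape 2 and scale 1/r.
        value = rng.gammavariate(2.0, 1.0 / value_rate)
        samples.append((t, value))
        # The next sampling time follows after another exponential interval.
        t += rng.expovariate(sample_rate)
    return samples
```

For example, `simulate_arm(1.0, 0.5, 100.0, random.Random(42))` returns roughly 100 samples whose values average near 2 / 0.5 = 4. Because the exponential intervals make the sampling times a Poisson process, a decision maker who may switch at any moment can stop observing one arm and start observing another between any two of these sampling epochs.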